This paper introduces a new approach to the study of rates of
convergence for posterior distributions. It is a natural extension of
a recent approach to the study of Bayesian consistency. In particular,
we improve on current rates of convergence for models including
the mixture of Dirichlet process model and the random Bernstein
polynomial model.

1. Introduction. Recently, there have been many contributions to the
theory of Bayesian consistency for infinite-dimensional models. Most of these
adopt the "frequentist" (or "what if") approach, which consists of generating
independent data from a "true" fixed density $f_0$ and checking whether the
sequence of posterior distributions accumulates in Hellinger neighborhoods
of $f_0$. The determination of sufficient conditions for Hellinger consistency
has been the main goal of a number of recent papers such as, for example,
[1, 2, 5] and [12]. A summary is provided in [8]. Their results rely upon
the use of uniformly consistent tests, combined with the construction of
suitable sieves and computation of metric entropies. An alternative method
for solving the problem can be found in [14], where a sufficient condition in
terms of the summability of prior probabilities is provided.

Here, we consider the allied problem of determining rates of convergence,
that is, the determination of a sequence $(\varepsilon_n)_{n\ge 1}$ such that
$\varepsilon_n \downarrow 0$ and

$$\Pi_n(\{f : d(f, f_0) > M\varepsilon_n\}) \to 0$$

for any constant $M > 0$. The above-displayed convergence can be understood
either as convergence in $F_0^\infty$-probability or as almost surely
$[F_0^\infty]$, where $F_0$ denotes the probability distribution associated
with $f_0$ and $F_0^\infty$ is the infinite product distribution. Among recent
papers dealing with this topic, we mention [4, 6, 7] and [13]. The key to these
papers is the construction of a sieve and the use of entropies. The ultimate
rate of convergence achieved depends on two quantities: the concentration rate,
which depends on the prior mass assigned to suitable neighborhoods of $f_0$,
and the growth rate of the Hellinger entropy. A recent contribution, relying
upon information theory, is given in [15]. The aim of the present paper is to
tackle the problem based on the approach of Walker [14], which leads to
improvements in the examples we consider.

In Section 2, we first derive a useful bound for the posterior probability on 
the sets of interest and then prove a general theorem for the determination 
of rates, which relies upon two conditions. In Section 3, the normal mixture 
of Dirichlet process and random Bernstein polynomials are considered and 
currently known rates are improved. 

2. Posterior convergence rates. Consider a sequence of observations
$(X_n)_{n\ge 1}$, each taking values in some Polish space $\mathbb{X}$ endowed
with the Borel $\sigma$-algebra $\mathscr{X}$. If $\mathbb{F}$ indicates the
space of probability density functions with respect to some $\sigma$-finite
measure $\lambda$ on $\mathbb{X}$, then the Hellinger metric $h$ on
$\mathbb{F}$ is defined by

$$h(f,g) = \left\{\int_{\mathbb{X}} \bigl(\sqrt{f(x)} - \sqrt{g(x)}\bigr)^2 \,\lambda(dx)\right\}^{1/2}$$

for any $f$ and $g$ in $\mathbb{F}$, and we set $\mathscr{F}$ to be the Borel
$\sigma$-algebra of $\mathbb{F}$. Suppose that $\Pi$ stands for a prior
distribution on $(\mathbb{F}, \mathscr{F})$. Then the posterior distribution,
given the observations $(X_1, \ldots, X_n)$, coincides with

$$\Pi_n(B) = \frac{\int_B \prod_{i=1}^n f(X_i)\,\Pi(df)}{\int_{\mathbb{F}} \prod_{i=1}^n f(X_i)\,\Pi(df)}$$

for all $B$ in $\mathscr{F}$. We assume that there exists a "true" density
function $f_0$ such that the $X_n$'s are i.i.d. from $f_0$. A sequence of
posterior distributions $\Pi_n$ is said to be Hellinger consistent at $f_0$ if
the posterior mass on sets of the type
$A_\varepsilon := \{f : h(f, f_0) > \varepsilon\}$ becomes negligible as the
sample size $n$ increases.
The approach introduced in [14] relies upon the construction of a suitable
covering of $A_\varepsilon$ by Hellinger balls of radius $\delta < \varepsilon$.
The prior mass on these balls must be such that the sum of their square roots
is finite. This entails consistency. Then, when dealing with rates, it is
natural to refine the set $A_\varepsilon$ to
$A_{\varepsilon_n} = \{f : h(f, f_0) > \varepsilon_n\}$ and to consider a
covering $\{A_{n,j} : j = 1, 2, \ldots\}$ of $A_{\varepsilon_n}$, where each
$A_{n,j}$ has radius $\delta_n \in (0, \varepsilon_n)$. Consequently, we now
define

$$K_{\varepsilon_n} := \sum_{j\ge 1} \sqrt{\Pi(A_{n,j})},$$

a quantity that will be heavily relied on in this paper.
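The definitions above can be made concrete numerically. The following is a minimal sketch, not part of the paper: it assumes a prior supported on finitely many unit-variance normal densities (an illustrative choice), approximates the Hellinger distance by a midpoint rule on a truncated grid, and evaluates the posterior mass of $A_\varepsilon$. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hellinger(f, g, lo=-20.0, hi=20.0, steps=40000):
    """h(f, g) = { int (sqrt f - sqrt g)^2 }^{1/2}, midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        diff = math.sqrt(f(x)) - math.sqrt(g(x))
        total += diff * diff * dx
    return math.sqrt(total)


def posterior_weights(data, means, prior):
    """Posterior over the finite family {N(mu, 1)}, computed on the log scale."""
    logs = [math.log(p) + sum(math.log(normal_pdf(x, mu)) for x in data)
            for mu, p in zip(means, prior)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def posterior_mass_far(data, means, prior, mu0, eps):
    """Posterior mass of A_eps = {f : h(f, f0) > eps}, with f0 = N(mu0, 1)."""
    f0 = lambda x: normal_pdf(x, mu0)
    post = posterior_weights(data, means, prior)
    return sum(w for mu, w in zip(means, post)
               if hellinger(lambda x, m=mu: normal_pdf(x, m), f0) > eps)


if __name__ == "__main__":
    random.seed(0)
    means = [-1.0, 0.0, 0.5, 2.0]          # assumed support of the prior
    prior = [0.25, 0.25, 0.25, 0.25]       # assumed prior masses
    data = [random.gauss(0.0, 1.0) for _ in range(200)]
    print(posterior_mass_far(data, means, prior, mu0=0.0, eps=0.1))
```

For two unit-variance normals whose means differ by $\Delta$, the distance defined above has the closed form $h^2 = 2\{1 - \exp(-\Delta^2/8)\}$, which gives a convenient check on the numerical integration.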

Before stating the preliminary result, let us introduce some notation. Let
$L_{0,j} = \sqrt{\Pi(A_{n,j})}$ and for any $k \ge 1$, let

$$L_{k,j} = \sqrt{\int_{A_{n,j}} R_k(f)\,\Pi(df)},$$

where $R_k(f) = \prod_{i=1}^{k} f(X_i)/f_0(X_i)$ for every $k = 1, 2, \ldots$.
By exploiting the same martingale introduced in the proof of Theorem 4 in [14],
one can show that the following holds.

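Proposition 1. Suppose that $K_{\varepsilon_n} < +\infty$ and that

$$\sum_{n\ge 1} e^{-n\varepsilon_n^2/16} K_{\varepsilon_n} < +\infty, \tag{1}$$

where $(\varepsilon_n)_{n\ge 1}$ is a sequence such that $\varepsilon_n \to 0$
and $n\varepsilon_n^2 \to +\infty$. Then

$$F_0^\infty\Bigl(\liminf_n \Bigl\{\sum_{j\ge 1} L_{n,j} < e^{-n\varepsilon_n^2/16}\Bigr\}\Bigr) = 1. \tag{2}$$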

Proof. One can easily check that the identity

$$L_{k+1,j}/L_{k,j} = \sqrt{f_{k,A_{n,j}}(X_{k+1})/f_0(X_{k+1})}$$

holds, given that

$$f_{k,A_{n,j}}(x) = \int_{A_{n,j}} f(x) R_k(f)\,\Pi(df) \Big/ \int_{A_{n,j}} R_k(f)\,\Pi(df), \qquad k \ge 1,$$

represents the predictive distribution restricted to the set $A_{n,j}$, whereas
$f_{0,A_{n,j}}$ is the marginal density of the single observation restricted to
$A_{n,j}$. Let $\mathscr{G}_k$ be the $\sigma$-algebra generated by the
observations $X_1, \ldots, X_k$ and note that

$$E(L_{k+1,j} \mid \mathscr{G}_k) = L_{k,j}\{1 - h^2(f_{k,A_{n,j}}, f_0)/2\}.$$

Since $h(f_{k,A_{n,j}}, f_j) \le \delta_n$, where $f_j$ is any density in
$A_{n,j}$, from the triangle inequality one has
$h(f_{k,A_{n,j}}, f_0) > \varepsilon_n - \delta_n = \gamma_n > 0$. Hence, we fix
$k = n$ and iterate the conditional expectation to obtain

$$E(L_{n,j}) < (1 - \gamma_n^2/2)^n \sqrt{\Pi(A_{n,j})}.$$

Choose a sequence $(\eta_n)_{n\ge 1}$ such that $\eta_n \to 0$ and
$n\eta_n \to +\infty$. Apply Markov's inequality and the monotone convergence
theorem to obtain

$$F_0^\infty\Bigl(\sum_{j\ge 1} L_{n,j} > e^{-n\eta_n}\Bigr) \le e^{n\eta_n} E\Bigl(\sum_{j\ge 1} L_{n,j}\Bigr) = e^{n\eta_n} \sum_{j\ge 1} E(L_{n,j}).$$

Then

$$F_0^\infty\Bigl(\sum_{j\ge 1} L_{n,j} > e^{-n\eta_n}\Bigr) \le \exp\{-n\{-\log(1 - \gamma_n^2/2) - \eta_n\}\} \sum_{j\ge 1} \sqrt{\Pi(A_{n,j})} = \exp\{-n\{-\log(1 - \gamma_n^2/2) - \eta_n\}\} K_{\varepsilon_n}.$$

Setting $\delta_n = \varepsilon_n/2$ and $\eta_n = \varepsilon_n^2/16$, we have

$$-\log(1 - \gamma_n^2/2) - \eta_n > \gamma_n^2/2 - \eta_n = \varepsilon_n^2/16.$$

Finally, condition (1) yields, by a straightforward application of the
Borel-Cantelli lemma, the result in (2). □
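The final inequality in the proof rests on $-\log(1-x) > x$ for $x \in (0,1)$, applied with $x = \gamma_n^2/2 = \varepsilon_n^2/8$. The small numerical sketch below (function names are illustrative, not from the paper) evaluates the exponent for $\delta_n = \varepsilon_n/2$ and $\eta_n = \varepsilon_n^2/16$, and the resulting Markov bound, so that the claim can be checked over the admissible range of Hellinger radii $\varepsilon \le \sqrt{2}$.

```python
import math


def proposition_exponent(eps):
    """-log(1 - gamma^2/2) - eta, with delta = eps/2 and eta = eps^2/16."""
    delta = eps / 2.0
    gamma = eps - delta              # gamma_n = eps_n - delta_n
    eta = eps ** 2 / 16.0
    # log1p keeps precision when gamma^2/2 is tiny
    return -math.log1p(-gamma ** 2 / 2.0) - eta


def tail_bound(n, eps, k_eps):
    """Markov bound exp{-n * exponent} * K on the event sum_j L_{n,j} > exp(-n * eta)."""
    return math.exp(-n * proposition_exponent(eps)) * k_eps


if __name__ == "__main__":
    for eps in (0.05, 0.2, 0.8):
        print(eps, proposition_exponent(eps), eps ** 2 / 16.0)
```

Because the exponent exceeds $\varepsilon_n^2/16$, `tail_bound(n, eps, K)` is never larger than $e^{-n\varepsilon_n^2/16}K$, which is the summand appearing in condition (1).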

Here we discuss a suitable lower bound for the denominator of the posterior,
that is, $I_n := \int_{\mathbb{F}} R_n(f)\,\Pi(df)$. Regarding this point,
previous contributions provide bounds in probability rather than almost surely.
Indeed, Shen and Wasserman [13] and Ghosal, Ghosh and van der Vaart [6] give
results of the type

$$I_n > \exp(-cn\varepsilon_n^2) \quad \text{in } F_0^\infty\text{-probability}$$

for a constant $c > 0$, provided that $\Pi$ puts sufficient mass near $f_0$,
where closeness is measured through a combination of the Kullback-Leibler
divergence and the $L^2(F_0)$-norm of $\log(f_0/f)$. If
$K(f, f_0) = \int \log(f_0(x)/f(x))\, f_0(x)\,\lambda(dx)$ and
$V(f, f_0) = \int \{\log(f_0(x)/f(x))\}^2 f_0(x)\,\lambda(dx)$, then a
neighborhood of the type above is defined as

$$B(\varepsilon, f_0) = \{f : K(f, f_0) \le \varepsilon^2,\ V(f, f_0) \le \varepsilon^2\}. \tag{3}$$

One can now prove the following result. 

Theorem 1. Suppose that $\varepsilon_n, \delta_n \to 0$ and
$n\varepsilon_n^2, n\delta_n^2 \to +\infty$, and

(i) $e^{-n\delta_n^2/16} K_{\delta_n} \to 0$;

(ii) for some $C > 0$, $\Pi\{B(\varepsilon_n, f_0)\} \ge \exp(-Cn\varepsilon_n^2)$.

Then $\Pi_n(A_{\delta_n}) \to 0$ in $F_0^\infty$-probability when
$\varepsilon_n < \phi\delta_n$ for some sufficiently small $\phi > 0$.

Proof. Now,

$$\Pi_n(A_{\delta_n}) \le \sum_{j\ge 1} \Pi_n(A_{n,j}) \le \sum_{j\ge 1} \sqrt{\Pi_n(A_{n,j})} = \sum_{j\ge 1} L_{n,j} \Big/ \sqrt{I_n}$$

and so

$$\Pi_n(A_{\delta_n}) < \exp[-n\{\delta_n^2/16 + n^{-1}(\log I_n)/2\}]$$

in $F_0^\infty$-probability. Moreover, by Lemma 8.1 in [6], condition (ii)
implies that

$$I_n > \exp\{-n(1 + C)\varepsilon_n^2\} \quad \text{in } F_0^\infty\text{-probability}.$$

Hence, $n(\delta_n^2/8 + n^{-1}\log I_n) \to +\infty$ in $F_0^\infty$-probability
when $\varepsilon_n < \phi\delta_n$ and $(1 + C)\phi^2 < 1/8$. The result
follows. □
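As a quick arithmetic check on the last step of the proof, the constraint $(1 + C)\phi^2 < 1/8$ caps the admissible constant at $\phi < \{8(1 + C)\}^{-1/2}$. The helper below is only an illustrative sketch with a hypothetical name; $C$ is the constant from condition (ii).

```python
import math


def max_phi(C):
    """Supremum of the phi values satisfying (1 + C) * phi**2 < 1/8."""
    return 1.0 / math.sqrt(8.0 * (1.0 + C))


if __name__ == "__main__":
    for C in (0.5, 1.0, 3.0):
        print(C, max_phi(C))
```

Any $\phi$ strictly below the returned value satisfies the constraint; a larger $C$ (a weaker prior-mass condition) forces a smaller $\phi$.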

A sequence $(\varepsilon_n)_{n\ge 1}$ satisfying (ii) in Theorem 1 is also
referred to as prior concentration rate. As a simple illustration of condition
(i), one can consider the discrete prior which puts mass $\Pi_k$ on the density
$f_k$. If $\sum_{k\ge 1} \sqrt{\Pi_k} < +\infty$ then $K_{\delta_n}$ is bounded
by this sum and hence condition (i) reduces to $e^{-n\delta_n^2/16} \to 0$,
which is trivially satisfied for $\delta_n = \lambda_n/\sqrt{n}$ for any
$\lambda_n \to \infty$.
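To illustrate the summability requirement on a discrete prior, the sketch below (an example built on stated assumptions, not taken from the paper) uses weights $\Pi_k \propto k^{-s}$ truncated at finitely many terms. For $s = 3$ the partial sums of $\sqrt{\Pi_k}$ stabilise, whereas for $s = 2$ they grow like a harmonic series, so only the former fits the illustration. It also evaluates the reduced condition for a sequence of the form $\lambda_n/\sqrt{n}$, where the factor $n$ cancels and leaves $e^{-\lambda_n^2/16}$.

```python
import math


def sqrt_mass_partial_sum(exponent, terms):
    """Partial sum of sqrt(Pi_k) for Pi_k proportional to k**(-exponent), k = 1..terms.

    The normalising constant is approximated by the same truncation.
    """
    weights = [k ** (-exponent) for k in range(1, terms + 1)]
    norm = sum(weights)
    return sum(math.sqrt(w / norm) for w in weights)


def reduced_condition(lam):
    """exp(-n * r_n**2 / 16) for r_n = lam / sqrt(n); n cancels out."""
    return math.exp(-lam ** 2 / 16.0)


if __name__ == "__main__":
    for terms in (10 ** 3, 10 ** 4, 10 ** 5):
        print(terms, sqrt_mass_partial_sum(3.0, terms), sqrt_mass_partial_sum(2.0, terms))
    print([reduced_condition(lam) for lam in (2.0, 5.0, 10.0)])
```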

Before moving on to consider specific priors, we need to modify the above
results, relying on the technique of Lijoi, Prünster and Walker [9], which
was developed for establishing consistency of the mixture of Dirichlet process
model. Let $N(\delta, S, d)$ denote the minimum number of balls of radius at
most $\delta$, with respect to the metric $d$, needed to cover the space $S$.
This is also known as the $\delta$-covering number of $S$. Moreover, introduce
a collection of sets $\{B_{n,k} : k \ge 1\}$ which, for any $n \ge 1$, forms a
partition of $\mathbb{F}$. Accordingly, we denote by
$\{A_{n,k,j} : j = 1, \ldots, N(\varepsilon_n, B_{n,k}, h)\}$ an
$\varepsilon_n$-covering of $B_{n,k}$ with respect to the Hellinger distance
$h$. Hence, one can easily check that



$$\sum_{j=1}^{N(\varepsilon_n, B_{n,k}, h)} \sqrt{\Pi(A_{n,k,j})} \le N(\varepsilon_n, B_{n,k}, h)\,\sqrt{\Pi(B_{n,k})}. \tag{4}$$



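Next, it is clear that the family
$\{A_{n,k,j} : j = 1, \ldots, N(\varepsilon_n, B_{n,k}, h),\ k \ge 1\}$ is a
partition of $\mathbb{F}$ into sets of diameter, with respect to the Hellinger
distance, at most $\varepsilon_n$. Finally, using (4), one can write

$$K_{\varepsilon_n} = \sum_{k=1}^{\infty} \sum_{j=1}^{N(\varepsilon_n, B_{n,k}, h)} \sqrt{\Pi(A_{n,k,j})} \le \sum_{k=1}^{\infty} N(\varepsilon_n, B_{n,k}, h)\,\sqrt{\Pi(B_{n,k})}.$$

Hence, we are interested in establishing, for some sequence
$(\varepsilon_n)_{n\ge 1}$ such that $\varepsilon_n \to 0$ and
$n\varepsilon_n^2 \to +\infty$ as $n \to \infty$, the validity of

$$e^{-n\varepsilon_n^2} \sum_{k=1}^{\infty} N(\varepsilon_n, B_{n,k}, h)\,\sqrt{\Pi(B_{n,k})} \to 0.$$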

3. Illustrations. In the examples that follow, we show that the rate of 
convergence is governed by the concentration rate. In particular, we look at 
mixtures of Dirichlet processes and the random Bernstein polynomial model. 
3.1. Normal mixture of Dirichlet process. The most widely used prior
distribution for density estimation is undoubtedly the normal mixture of
Dirichlet process (MDP) introduced by Lo [10] and later popularized by
Escobar and West [3]. Such a random density function is given by

$$f_{\sigma,P}(x) = (\phi_\sigma * P)(x) = \int \phi_\sigma(x - \theta)\,P(d\theta), \tag{5}$$

where the kernel $\phi_\sigma$ is the density function of the normal
distribution with mean zero and variance $\sigma^2$. Moreover, $P$ is a
Dirichlet process with parameter measure $\alpha(\cdot)$, and $\sigma$ has a
prior distribution which we denote by $\mu$. The issue of strong consistency
for the model (5) has been studied in [5] and [9], whereas rates are determined
in [7].

We focus on the case in which the support of $\mu$ coincides with the interval
$[\underline{\sigma}, \overline{\sigma}]$, where
$0 < \underline{\sigma} < \overline{\sigma} < +\infty$, and we suppose that
$f_0 = \phi_{\sigma_0} * P_0$. This is the same setting considered in [7]: when
either $P_0$ has compact support or $\alpha$ has sub-Gaussian tails, they
achieve the best rate of $(\log n)^{\kappa}/\sqrt{n}$ for $\kappa \ge 1$. In
particular, for the usual Gaussian tails for $\alpha$, $\kappa = 3/2$. Although
these models allow the desirable prior concentration rate [condition (ii) of
Theorem 1] of $(\log n)/\sqrt{n}$, the worse entropy estimate determines their
rate.

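We, on the other hand, can obtain the target rate $(\log n)/\sqrt{n}$ in more
general models. To this end, we introduce sets of the type

$$\mathscr{F}_{a,\delta} = \bigcup_{\underline{\sigma} < \sigma < \overline{\sigma}} \{\phi_\sigma * P : P([-a, a]) \ge 1 - \delta\},$$

where $a > 0$. Moreover, recall that the inequality
$h^2(f, g) \le \|f - g\|_1$ yields

$$N(\sqrt{\delta}, \mathscr{P}, h) \le N(\delta, \mathscr{P}, \|\cdot\|_1) \tag{6}$$

for any collection $\mathscr{P}$ of density functions. Now, from [5] the upper
bound for the $L_1$-metric entropy of the set $\mathscr{F}_{a,\delta}$ is given
by

$$\log N(\delta, \mathscr{F}_{a,\delta}, \|\cdot\|_1) \le aC_\delta,$$

where $C_\delta = K\underline{\sigma}^{-1}\delta^{-1}\log(1/\delta)$ for some
constant $K$. Hence, in view of (6), one finds that

$$\log N(\delta, \mathscr{F}_{a,\delta^2}, h) \le \frac{aK}{\underline{\sigma}}\,\delta^{-2}\log(1/\delta^2). \tag{7}$$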

Now, for each $n$ let $(a_{n,j})_{j\ge 1}$ be an increasing sequence of positive
numbers such that $a_{n,j} \uparrow +\infty$ as $j \to +\infty$, and for
$j \ge 2$ set

$$\mathscr{G}_{n,j,\delta} := \bigcup_{\underline{\sigma} < \sigma < \overline{\sigma}} \{\phi_\sigma * P : P([-a_{n,j+1}, a_{n,j+1}]) \ge 1 - \delta^2,\ P([-a_{n,j}, a_{n,j}]) < 1 - \delta^2\},$$

while setting

$$\mathscr{G}_{n,1,\delta} := \bigcup_{\underline{\sigma} < \sigma < \overline{\sigma}} \{\phi_\sigma * P : P([-a_{n,1}, a_{n,1}]) \ge 1 - \delta^2\}.$$

Such sets cover the support of the distribution of the mixture of Dirichlet
process defined in (5). It is obvious that for $j \ge 2$,
$\mathscr{G}_{n,j,\delta}$ is included in $\mathscr{F}_{a_{n,j+1},\delta^2}$, so

$$\log N(\delta_n, \mathscr{G}_{n,j,\delta_n}, h) \le C_{\delta_n^2}\, a_{n,j+1}.$$

This suggests that for each $j \ge 2$, $\mathscr{G}_{n,j,\delta_n}$ has a finite
Hellinger $\delta_n$-covering $\{C_{n,j,i} : i = 1, \ldots, N_{n,j}\}$, where
$N_{n,j} \le [\exp(C_{\delta_n^2}\, a_{n,j+1})] + 1$ and $[x]$ stands for the
integer part of $x > 0$. Hence, setting

$$B_{n,j} = \{P : P([-a_{n,j+1}, a_{n,j+1}]) \ge 1 - \delta_n^2,\ P([-a_{n,j}, a_{n,j}]) < 1 - \delta_n^2\}$$

for $j \ge 2$, one has



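$$\sum_{j\ge 2} \sum_{i=1}^{N_{n,j}} \sqrt{\Pi(C_{n,j,i})} \le \sum_{j\ge 2} N_{n,j}\,\sqrt{\mathscr{D}_\alpha(B_{n,j})},$$

where $\mathscr{D}_\alpha$ is the law of the Dirichlet process with parameter
$\alpha$. If $V_{n,j} := [-a_{n,j}, a_{n,j}]^c$, then
$B_{n,j} \subset \{P : P(V_{n,j}) > \delta_n^2\}$. By the Markov inequality,

$$\mathscr{D}_\alpha(B_{n,j}) \le \mathscr{D}_\alpha(\{P : P(V_{n,j}) > \delta_n^2\}) \le c\,\alpha(V_{n,j})\,\delta_n^{-2}$$

for some constant $c$ and, thus, one has

$$K_{\delta_n} \le N_{n,1} + \sum_{j\ge 2} c^{1/2}\delta_n^{-1} \exp\{a_{n,j+1} C_{\delta_n^2} - a_{n,j}^2\}$$

when, as we assume, $\alpha([-a, a]^c) \le \exp(-2a^2)$. Hence, if we put, for
$j \ge 2$, $a_{n,j} = jC_{\delta_n^2}$, then the sum in the bound for
$K_{\delta_n}$ is bounded by

$$c^{1/2} \sum_{j\ge 2} \delta_n^{-1} \exp\{-(j - 1)^2 C_{\delta_n^2}^2\},$$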

which goes to zero as $n \to +\infty$. Now, $N_{n,1}$ is the Hellinger
$\delta_n$-covering number of the set
$\{P : P([-a_{n,1}, a_{n,1}]) \ge 1 - \delta_n^2\}$. According to [7], it is
the case that

$$N_{n,1} \le L_1 \exp\{(\log(1/\delta_n))^2\}$$

when $a_{n,1} \le L_2\sqrt{\log(1/\delta_n)}$ for constants $L_1$ and $L_2$. So,
$\exp\{-n\delta_n^2\}K_{\delta_n} \to 0$ when
$\exp\{-n\delta_n^2 + (\log(1/\delta_n))^2\} \to 0$, which occurs when
$n\delta_n^2 - (\log(1/\delta_n))^2 \to +\infty$. Clearly,
$\delta_n = M(\log n)/\sqrt{n}$ for some large enough $M$ is sufficient.
Hence, for example, we obtain an overall rate of convergence
$(\log n)/\sqrt{n}$ with normal $\alpha$ when the true mixing distribution
$P_0$ has sub-Gaussian tails. This improves on Ghosal and van der Vaart [7],
who obtain a rate of $(\log n)^{3/2}/\sqrt{n}$ in this case.
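The sufficiency of $\delta_n = M(\log n)/\sqrt{n}$ for the displayed divergence can be seen directly: $n\delta_n^2 = M^2(\log n)^2$, while for large $n$ (so that $\delta_n \le 1 \le M\log n$) one has $0 \le \log(1/\delta_n) = \tfrac12\log n - \log(M\log n) \le \tfrac12\log n$, so the difference is at least $(M^2 - \tfrac14)(\log n)^2$. The following sketch evaluates the gap numerically; the function name is hypothetical and $M = 1$ is chosen only for illustration.

```python
import math


def mdp_gap(n, M):
    """n * delta_n**2 - (log(1 / delta_n))**2 for delta_n = M * log(n) / sqrt(n)."""
    delta = M * math.log(n) / math.sqrt(n)
    return n * delta ** 2 - math.log(1.0 / delta) ** 2


if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8):
        print(n, mdp_gap(n, M=1.0))
```

The gap grows on the order of $(\log n)^2$, so $\exp\{-n\delta_n^2 + (\log(1/\delta_n))^2\}$ decays polynomially in $n$ rather than exponentially.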
3.2. Random Bernstein polynomials. Another important prior for density
estimation is the so-called random Bernstein polynomial introduced in
[11]. Such a random density admits the representation

$$b(x; k, F) = \sum_{j=1}^{k} [F(j/k) - F((j - 1)/k)]\,\beta(x; j, k - j + 1),$$

where $\beta(x; a, b)$ is the beta density with parameters $a, b > 0$. In the
previous representation, $F$ is a random distribution function, usually chosen
to be a Dirichlet process, and $k$ has distribution $\rho$ and is independent
of $F$. Assuming $f_0$ is in the Kullback-Leibler support of the prior,
consistency of such priors has been established in [12] and [14], where it has
been shown that strong consistency holds under a suitable tail condition on
$\rho$. Rates of convergence have been determined in [4]. In Theorem 2.3 of
[4], it is shown that the prior concentration rate is
$(\log n)^{1/3}/n^{1/3}$ and that the entropy rate is
$(\log n)^{5/6}/n^{1/3}$, thus leading to an overall convergence rate of
$(\log n)^{5/6}/n^{1/3}$.
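The representation of $b(x; k, F)$ given at the start of this subsection can be evaluated directly. The following is a minimal sketch that assumes a fixed (non-random) distribution function $F$ on $[0, 1]$ purely for illustration; in the model itself $F$ is random. Function names are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density on the open interval (0, 1), via log-gamma for stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))


def bernstein_density(x, k, cdf):
    """b(x; k, F) = sum_{j=1}^k [F(j/k) - F((j-1)/k)] * beta(x; j, k-j+1)."""
    return sum((cdf(j / k) - cdf((j - 1) / k)) * beta_pdf(x, j, k - j + 1)
               for j in range(1, k + 1))


if __name__ == "__main__":
    uniform_cdf = lambda t: t           # assumed F, for illustration only
    print([bernstein_density(x, 5, uniform_cdf) for x in (0.1, 0.5, 0.9)])
```

With the uniform distribution function $F(t) = t$ every weight equals $1/k$ and the mixture reduces to the uniform density on $(0, 1)$, which provides a simple sanity check; for any other distribution function on $[0, 1]$ the weights still sum to one, so $b(\cdot; k, F)$ integrates to one.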

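Following our bound for $K_{\varepsilon_n}$ in Section 2, define $B_j$ to be
the set of Bernstein polynomials of order $j$. Using the upper bound
$N(\varepsilon_n, B_j, h) \le (C/\varepsilon_n)^j$ provided by [4], we have

$$K_{\varepsilon_n} \le \sum_{r=1}^{a_{n,1}} (C/\varepsilon_n)^r \sqrt{\rho_r} + \sum_{j=1}^{\infty} \sum_{r=a_{n,j}+1}^{a_{n,j+1}} (C/\varepsilon_n)^r \sqrt{\rho_r}.$$

Here we have introduced, for each $n$, an increasing sequence of reals
$(a_{n,j})_{j\ge 1}$ which will be determined later on. Using the inequality
$\sum_{r=1}^{M} c^r \le Mc^M$ for $c > 1$, we have

$$K_{\varepsilon_n} \le a_{n,1}(C/\varepsilon_n)^{a_{n,1}} + \sum_{j=1}^{\infty} (C/\varepsilon_n)^{a_{n,j+1}} (a_{n,j+1} - a_{n,j}) \sqrt{\rho_{a_{n,j}}}.$$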

Here we have assumed that the $\rho_k$'s are decreasing for all large $k$ and
we will also assume that $\rho_k < \exp(-4k\log k)$ for all large $k$.
Therefore, putting $a_{n,j} = Cj/\varepsilon_n$, we have the sum in the bound
for $K_{\varepsilon_n}$ bounded by

$$C\varepsilon_n^{-1} \sum_{j=1}^{\infty} \exp\{(j + 1)C\varepsilon_n^{-1}\log(C/\varepsilon_n) - 2jC\varepsilon_n^{-1}\log(jC/\varepsilon_n)\},$$

which is bounded by
$C\varepsilon_n^{-1} \sum_{j=1}^{\infty} \exp\{-2jC\varepsilon_n^{-1}\log j\}$.
In turn, this sum is bounded by $D/\varepsilon_n$ as $n \to +\infty$ for some
constant $D$; the term $j = 1$ ensures this. Returning to the first term in the
bound for $K_{\varepsilon_n}$, we are interested in finding $\varepsilon_n$ for
which $\varepsilon_n^{-1}\exp\{-n\varepsilon_n^2\} \to 0$ and
$\exp\{-n\varepsilon_n^2 + C\varepsilon_n^{-1}\log(C/\varepsilon_n)\} \to 0$,
the latter holding when
$n\varepsilon_n^2 - C\varepsilon_n^{-1}\log(C/\varepsilon_n) \to +\infty$. This
clearly happens when

$$\varepsilon_n = M(\log n)^{1/3}/n^{1/3}$$

for sufficiently large $M$. Consequently, under the conditions of Theorem
2.3 in [4], we obtain a rate of convergence of $(\log n)^{1/3}/n^{1/3}$, which
is the rate of convergence for the sieve MLE, whereas Ghosal [4] obtains a rate
of $(\log n)^{5/6}/n^{1/3}$. Note that with lighter tails for $\rho$, namely
$\rho_k < \exp(-2k^2)$, we can obtain a rate of $(\log n)/\sqrt{n}$ for
$\varepsilon_n$, but the overall rate will remain at $(\log n)^{1/3}/n^{1/3}$.
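To leading order, $\varepsilon_n = M(\log n)^{1/3}/n^{1/3}$ gives $n\varepsilon_n^2 = M^2 n^{1/3}(\log n)^{2/3}$ and $C\varepsilon_n^{-1}\log(C/\varepsilon_n) \approx \{C/(3M)\}\, n^{1/3}(\log n)^{2/3}$, so the difference diverges once $M^3 > C/3$. The sketch below checks this numerically, together with the elementary inequality $\sum_{r=1}^{m} c^r \le mc^m$ for $c > 1$ used in this subsection. Names and constants are illustrative; $C = M = 1$ are arbitrary choices.

```python
import math


def bernstein_gap(n, M, C=1.0):
    """n * eps_n**2 - (C / eps_n) * log(C / eps_n) for eps_n = M * (log n / n)**(1/3)."""
    eps = M * (math.log(n) / n) ** (1.0 / 3.0)
    return n * eps ** 2 - (C / eps) * math.log(C / eps)


def geometric_sum_bound_holds(c, m):
    """Check sum_{r=1}^m c**r <= m * c**m, which is valid for c > 1."""
    return sum(c ** r for r in range(1, m + 1)) <= m * c ** m


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        print(n, bernstein_gap(n, M=1.0))
    print(geometric_sum_bound_holds(2.0, 10))
```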

Acknowledgments. We wish to thank Catia Scricciolo for helpful remarks.
We are grateful to two referees for comments that led to improvements
and, in particular, to one referee who pointed out an inaccuracy in
an earlier version of the paper.